Yuchen Zhang

Jack

IstGPT: LLM-based Anomaly Detection for Spatial-Temporal Graph in Industrial Systems

Jun 01, 2026

Yuchen Zhang, Ning Xi, Pengbin Feng, Shigang Liu, Jianfeng Ma, Yulong Shen, Yanan Sun, Xiaolin Zhou

Abstract:Industrial Internet systems face increasing threats from sophisticated industrial control system (ICS) attacks, resulting in critical safety incidents. However, existing tools exhibit limited effectiveness in real-time anomaly detection due to the complex dependencies among sensors and actuators. To tackle this, we present IstGPT, the first industrial anomaly detection tool based on LLMs and graph learning to provide real-time protection against a wide range of ICS attacks. IstGPT achieves fine-grained and precise modeling of spatial-temporal dependencies in industrial cyber-physical systems. It first leverages industrial multi-modal knowledge, including operational data, technical documents, and system diagrams, to extract sensor-actuator dependency graphs via multi-stage prompt engineering. Then, LLM-Optimation iteratively refines the graph based on node accuracy, edge consistency, and logical coherence. Finally, IstGPT integrates improved graph neural networks with an encoder-decoder architecture to detect anomalies via reconstruction errors. We evaluate IstGPT against 12 state-of-the-art baselines on 9 datasets, including 2 public, 6 simulated, and a real-world robotic arm dataset. IstGPT achieves the best F1-scores and eTaF1 (a newer time-aware metric) across all nine datasets. We further discuss the feasibility of deploying IstGPT in real-world industrial scenarios.
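
The abstract states that anomalies are detected via reconstruction errors. As a rough illustration of that general idea only (not IstGPT's actual model), the sketch below scores each time step by the mean squared error between observed and reconstructed sensor vectors and flags steps above a threshold. The function names, the `margin` parameter, and the threshold rule are all assumptions made for this example; in practice the reconstructions would come from a trained encoder-decoder.

```python
from typing import List, Sequence


def reconstruction_errors(
    observed: Sequence[Sequence[float]],
    reconstructed: Sequence[Sequence[float]],
) -> List[float]:
    """Mean squared error per time step between observed and reconstructed vectors."""
    errors = []
    for x, x_hat in zip(observed, reconstructed):
        errors.append(sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x))
    return errors


def fit_threshold(normal_errors: Sequence[float], margin: float = 1.5) -> float:
    """Hypothetical rule: scale the largest error seen on attack-free data."""
    return margin * max(normal_errors)


def detect(errors: Sequence[float], threshold: float) -> List[bool]:
    """Flag a time step as anomalous when its error exceeds the threshold."""
    return [e > threshold for e in errors]


# Toy usage: the second time step is reconstructed poorly and gets flagged.
observed = [[1.0, 2.0], [1.0, 2.0]]
reconstructed = [[1.0, 2.0], [3.0, 2.0]]
errs = reconstruction_errors(observed, reconstructed)
flags = detect(errs, fit_threshold([0.1, 0.2]))
```

A fixed multiple of the maximum normal error is only one simple choice; quantile-based or time-aware thresholds are common alternatives.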

GeoMag: Geometric-Aware Video Motion Magnification via State Space Model

May 28, 2026

Kecheng Han, Yuchen Zhang, Bingqing Liu, Boqiang Guo, Wenbin Zheng, Shiyuan Pei

Abstract:Video Motion Magnification (VMM) reveals imperceptible dynamics but often suffers from structural inconsistencies under complex geometric transformations. Existing learning-based methods generally face a trade-off between the limited global context of CNNs and the high computational cost of Transformers. In addition, current training protocols, largely dominated by simple linear motion, fail to capture the geometric and imaging complexities encountered in real-world videos. To address these issues, we propose GeoMag, a geometric-aware VMM framework built upon State Space Models to achieve globally consistent motion amplification with linear complexity. We further construct Geo-200K, a large-scale synthetic dataset that introduces rich geometric transformations together with sensor-realistic degradations, improving the diversity and realism of training signals. Extensive experiments on synthetic and real-world benchmarks show that GeoMag consistently outperforms prior methods in visual fidelity and computational efficiency, while producing fewer artifacts and better structural consistency.

* ICME 2026 Spotlight

REVEAL: Reference-Grounded Reasoning for Multimodal Manipulation Detection

May 27, 2026

Jun Zhou, Bingwen Hu, Yaxiong Wang, Zhedong Zheng, Yongzhen Wang, Yuchen Zhang, Ping Liu

Abstract:Multimodal manipulation detection aims to simultaneously identify forged image-text pairs and localize tampered regions, yet existing methods typically rely on memorizing isolated artifacts and struggle with imperceptible manipulation traces or domain shifts. Inspired by human comparative reasoning, we reformulate this task as a reference-grounded verification problem, where authenticity is assessed by comparing a query against retrieved authentic evidence. We propose REVEAL (Reference-Enabled Verification for Evidence Analysis and Localization), a framework explicitly designed for this comparative paradigm. To support this paradigm, we construct a large-scale reference library comprising 170K authentic news image-text pairs featuring over 40K public figures. Technically, REVEAL employs a difference-aware fusion mechanism to capture fine-grained discrepancies between the query and retrieved evidence. Furthermore, we introduce a task-decoupled Mixture-of-Experts (MoE) architecture to jointly execute instance-level detection and fine-grained grounding, effectively mitigating optimization conflicts between these heterogeneous objectives. Extensive experiments demonstrate that REVEAL significantly outperforms state-of-the-art methods, and notably enables training-free domain adaptation by simply updating the reference library, offering a robust and practical solution for detecting evolving misinformation. Code is available at https://anonymous.4open.science/r/REVEAL-Reference-A006.

* 11 pages, 3 figures

Teaching Thinking Models to Reason with Tools: A Full-Pipeline Recipe for Tool-Integrated Reasoning

May 07, 2026

Qianjia Cheng, Yuchen Zhang, Zhilin Wang, Yuxin Zuo, Shunkai Zhang, Yuchen Fan, Yu Qiao, Bowen Zhou, Ning Ding, Yu Cheng (+2 more)

Abstract:Tool-integrated reasoning (TIR) offers a direct way to extend thinking models beyond the limits of text-only reasoning. Paradoxically, we observe that tool-enabled evaluation can degrade reasoning performance even when the strong thinking models make almost no actual tool calls. In this paper, we investigate how to inject natural tool-use behavior into a strong thinking model without sacrificing its no-tool reasoning ability, and present a comprehensive TIR recipe. We highlight that (i) the effectiveness of TIR supervised fine-tuning (SFT) hinges on the learnability of teacher trajectories, which should prioritize problems inherently suited for tool-augmented solutions; (ii) controlling the proportion of tool-use trajectories could mitigate the catastrophic forgetting of text-only reasoning capacity; (iii) optimizing for pass@k and response length instead of training loss could maximize TIR SFT gains while preserving headroom for reinforcement learning (RL) exploration; (iv) a stable RL with verifiable rewards (RLVR) stage, built upon suitable SFT initialization and explicit safeguards against mode collapse, provides a simple yet remarkably effective solution. When applied to Qwen3 thinking models at 4B and 30B scales, our recipe yields models that achieve state-of-the-art performance in a wide range of benchmarks among open-source models, such as 96.7% and 99.2% on AIME 2025 for 4B and 30B, respectively.
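
The abstract refers to optimizing for pass@k but does not say how it is computed. A minimal sketch, assuming the widely used unbiased estimator 1 - C(n-c, k) / C(n, k), where n samples are drawn per problem and c of them are correct; the function name and argument checks are choices made for this example, not details from the paper.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the probability that at least one of k samples is correct,
    given that c out of n generated samples were correct."""
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # comb(n - c, k) counts the size-k subsets that contain no correct sample;
    # it is 0 when fewer than k incorrect samples exist.
    return 1.0 - comb(n - c, k) / comb(n, k)


# With 4 samples and 1 correct, drawing 2 at random hits it half the time.
example = pass_at_k(n=4, c=1, k=2)
```

Averaging this value over all problems in a benchmark gives the reported pass@k.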

Beyond Known Objects: A Novel Framework for Open-Set Object Detection using Negative-Aware Norm

May 04, 2026

Yuchen Zhang, Yao Lu, Johannes Betz

Abstract:Open-Set Object Detection (OSOD) is crucial for autonomous driving, where perception systems must recognize and localize both known and previously unseen objects in complex, dynamic environments. While recent approaches deliver promising results, they often require retraining the detector extensively to learn objectness, which describes the likelihood that a bounding box tightly encloses a valid object, regardless of whether its category was learned during training. Deviating from existing work, we hypothesize that standard off-the-shelf detectors may already contain helpful cues for objectness, owing to their training on numerous and diverse known categories. Building on this idea, we propose NAN-SPOT, a training-light framework that does not require retraining the base object detector and estimates objectness by leveraging a hidden layer metric called Negative-Aware Norm (NAN), requiring only minutes of training on just hundreds of images. To support comprehensive evaluation, we introduce COCO-Open, an expanded version of the existing COCO-Mixed dataset, increasing unknown object annotations from 433 to 1853, making it the most exhaustively labeled dataset for OSOD to the best of our knowledge. Experimental results demonstrate that NAN-SPOT achieves even better performance on unknown object detection than methods requiring heavy training, without compromising performance on known objects. This efficiency and robustness make NAN-SPOT a promising step towards open-world perception in autonomous driving.

* Submitted to the IEEE Intelligent Vehicles Symposium (IV 2026), Detroit, MI, United States

Quantization-Aware EE Optimization and SE-EE Tradeoff for MiLAC-Aided MU-MISO Beamforming

Apr 27, 2026

Yuchen Zhang, Pinjun Zheng, Tareq Y. Al-Naffouri

Abstract:In large antenna arrays, hardware power consumption becomes a dominant design constraint, making energy efficiency (EE) a first-class objective alongside spectral efficiency (SE). Microwave linear analog computer (MiLAC)-aided beamforming, whose front end is a passive reciprocal stream-to-antenna network, addresses this tension by reducing the active radio-frequency chain count to the stream number, at a moderate SE cost. Despite this promise, no EE optimization framework has been established for MiLAC-aided beamforming that accounts for digital-to-analog converter quantization noise and post-quantized transmit power. We fill this gap for downlink multiuser multiple-input single-output (MU-MISO) systems by formulating quantization-aware EE maximization over the MiLAC-feasible beamformer and characterizing the resulting SE-EE tradeoff. Three contributions follow. First, we prove a row-space optimality property of the effective MiLAC-aided beamformer, yielding an equivalent reduced-dimension reformulation whose complexity scales with the stream number rather than the antenna number. Second, we develop a low-complexity Dinkelbach-weighted minimum mean-square error algorithm aided by projected gradient descent that is guaranteed to converge to a stationary point. Third, we cast the SE-EE tradeoff as a multi-objective problem and trace its Pareto boundary via a weighted-sum method that combines an alternative reduced-dimension coordinate with auxiliary-variable successive convex approximation, yielding convex per-iteration subproblems with guaranteed convergence. Numerical results on a DeepMIMO v4 deployment show MiLAC-aided beamforming substantially improves EE over digital and hybrid benchmarks at a moderate SE cost and significantly expands the achievable SE-EE operating region.
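
Energy efficiency is a ratio of rate to consumed power, which is why the abstract's algorithm builds on Dinkelbach's method for fractional programs. The sketch below shows only the plain Dinkelbach iteration on a toy single-link problem, maximizing log2(1 + g p) / (p + P_c) over 0 <= p <= P_max. It is not the paper's Dinkelbach-weighted MMSE algorithm with projected gradient descent, and the scalar model, the function name, and all parameter names are assumptions made for this illustration.

```python
import math


def dinkelbach_ee(gain: float, p_circuit: float, p_max: float,
                  tol: float = 1e-9, max_iter: int = 100):
    """Maximize log2(1 + gain * p) / (p + p_circuit) for 0 <= p <= p_max.

    Returns (transmit power, energy efficiency) for the toy model.
    """
    lam = 0.0  # current guess of the optimal ratio
    p = p_max
    for _ in range(max_iter):
        # Inner problem: maximize rate(p) - lam * power(p). For lam > 0 the
        # stationary point is p = 1 / (lam * ln 2) - 1 / gain, clipped to range.
        if lam <= 0.0:
            p = p_max
        else:
            p = min(max(1.0 / (lam * math.log(2)) - 1.0 / gain, 0.0), p_max)
        rate = math.log2(1.0 + gain * p)
        power = p + p_circuit
        # Dinkelbach stopping rule: the parametric objective reaches zero.
        if rate - lam * power < tol:
            break
        lam = rate / power  # ratio update
    return p, rate / power


# Toy usage with unit channel gain and unit circuit power.
p_opt, ee_opt = dinkelbach_ee(gain=1.0, p_circuit=1.0, p_max=10.0)
```

For this toy setting the optimum can be checked by hand: with gain and circuit power both equal to 1, the ratio is maximized at p = e - 1.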

* This paper has been submitted to the IEEE for possible publication

Tri-Hybrid Beamforming Design for ISAC Systems with Reconfigurable Antennas

Apr 22, 2026

Jiangong Chen, Xia Lei, Yuchen Zhang, Kaitao Meng, Christos Masouros

Abstract:Integrated Sensing and Communication (ISAC) systems require efficient beamforming architectures to jointly support communication and sensing functionalities. To reduce hardware overhead, Hybrid Beamforming (HBF) has been widely studied and shown to achieve performance close to fully digital beamforming under practical hardware constraints. As a promising evolution, Reconfigurable Antenna (RA) technologies have recently emerged to further enhance beamforming Degrees of Freedom (DoFs) by dynamically reconfiguring antenna Electromagnetic (EM) characteristics, yet their integration into ISAC systems remains largely unexplored. In this paper, we investigate an RA-assisted ISAC system and develop a decoupled Triple-Hybrid Beamforming (Tri-HBF) framework that alternately optimizes digital, analog, and EM beamformers to maximize the communication rate and sensing Signal-to-Clutter-plus-Noise Ratio (SCNR). For both Single-user Single-target (SUST) and Multiple-user Multiple-target (MUMT) scenarios, we first transform the original fractional objectives into fraction-free ones via methods tailored to their respective structures. The resulting problems are then solved via alternating optimization over different variable blocks. Closed-form updates are derived for all variables except the EM beamforming subproblem in the MUMT scenario. To further reduce the complexity introduced by Semidefinite Relaxation (SDR) in EM beamforming, we propose a low-complexity iterative approach across antennas with closed-form updates. Simulation results demonstrate that the proposed scheme significantly outperforms benchmark designs with conventional omnidirectional and directional antennas, achieving almost 100% improvement in spectrum efficiency and 62.5% reduction in antenna overhead, thereby unveiling the

Near-Field Localization via Reconfigurable Antennas

Mar 17, 2026

Alireza Fadakar, Yuchen Zhang, Hui Chen, Musa Furkan Keskin, Henk Wymeersch, Andreas F. Molisch

Abstract:Reconfigurable antennas (RAs) utilize the electromagnetic (EM) domain to provide dynamic control over antenna radiation patterns, which offers an effective way to enhance power efficiency in wireless links. Unlike conventional arrays with fixed element patterns, RAs enable on-demand beam-pattern synthesis by directly controlling each antenna's EM characteristics. While existing research on RAs has primarily focused on improving spectral efficiency, this paper explores their application for downlink localization. Moreover, the majority of existing works focus on far-field scenarios, with little attention to near-field (NF) scenarios. Motivated by these gaps, we consider a synthesis model in which each antenna generates desired beampatterns from a finite set of EM basis functions. We then formulate a joint optimization problem for the baseband (BB) and EM precoders with the objective of minimizing the user equipment (UE) position error bound (PEB) in NF conditions. Our analytical derivations and extensive simulation results demonstrate that the proposed hybrid precoder design for RAs significantly improves UE positioning accuracy compared to traditional non-reconfigurable arrays.

* IEEE International Conference on Communications (ICC) 2026
* 6 pages, 5 figures, Accepted for publication in the 2026 IEEE International Conference on Communications (ICC)

How Far Can Unsupervised RLVR Scale LLM Training?

Mar 09, 2026

Bingxiang He, Yuxin Zuo, Zeyuan Liu, Shangziqi Zhao, Zixuan Fu, Junlin Yang, Cheng Qian, Kaiyan Zhang, Yuchen Fan, Ganqu Cui (+11 more)

Abstract:Unsupervised reinforcement learning with verifiable rewards (URLVR) offers a pathway to scale LLM training beyond the supervision bottleneck by deriving rewards without ground truth labels. Recent works leverage model intrinsic signals, showing promising early gains, yet their potential and limitations remain unclear. In this work, we revisit URLVR and provide a comprehensive analysis spanning taxonomy, theory and extensive experiments. We first classify URLVR methods into intrinsic versus external based on reward sources, then establish a unified theoretical framework revealing that all intrinsic methods converge toward sharpening the model's initial distribution. This sharpening mechanism succeeds when initial confidence aligns with correctness but fails catastrophically when misaligned. Through systematic experiments, we show intrinsic rewards consistently follow a rise-then-fall pattern across methods, with collapse timing determined by model prior rather than engineering choices. Despite these scaling limits, we find intrinsic rewards remain valuable in test-time training on small datasets, and propose Model Collapse Step to measure model prior, serving as a practical indicator for RL trainability. Finally, we explore external reward methods that ground verification in computational asymmetries, showing preliminary evidence they may escape the confidence-correctness ceiling. Our findings chart boundaries for intrinsic URLVR while motivating paths toward scalable alternatives.

* Accepted to ICLR 2026

Process Over Outcome: Cultivating Forensic Reasoning for Generalizable Multimodal Manipulation Detection

Mar 02, 2026

Yuchen Zhang, Yaxiong Wang, Kecheng Han, Yujiao Wu, Lianwei Wu, Li Zhu, Zhedong Zheng

Abstract:Recent advances in generative AI have significantly enhanced the realism of multimodal media manipulation, thereby posing substantial challenges to manipulation detection. Existing manipulation detection and grounding approaches predominantly focus on manipulation type classification under result-oriented supervision, which not only lacks interpretability but also tends to overfit superficial artifacts. In this paper, we argue that generalizable detection requires incorporating explicit forensic reasoning, rather than merely classifying a limited set of manipulation types, which fails to generalize to unseen manipulation patterns. To this end, we propose REFORM, a reasoning-driven framework that shifts learning from outcome fitting to process modeling. REFORM adopts a three-stage curriculum that first induces forensic rationales, then aligns reasoning with final judgments, and finally refines logical consistency via reinforcement learning. To support this paradigm, we introduce ROM, a large-scale dataset with rich reasoning annotations. Extensive experiments show that REFORM establishes new state-of-the-art performance with superior generalization, achieving 81.52% ACC on ROM, 76.65% ACC on DGM4, and 74.9 F1 on MMFakeBench.
